feat(retry): 429 rate-limit retry + multi-provider integration validation #10
Merged
aksOps merged 4 commits on May 14, 2026
Free / shared upstream tiers (e.g. OpenRouter ``…:free`` models)
throttle on short windows that need 30-60s to clear. The existing
5xx backoff (1.5s/3s/4.5s, total ~9s) exhausts retries before the
window opens again, surfacing the 429 as an EnvelopeMissingError
or a hard ``agent failed`` row.
Split ``_ainvoke_with_retry`` into two backoff regimes:
* 5xx + connection-reset markers: existing ``base_delay`` (1.5s)
→ 1.5s / 3.0s / 4.5s
* 429 / rate-limit markers: new ``rate_limit_base_delay`` (7.5s)
→ 7.5s / 15.0s / 22.5s (total ~45s before raising)
``_RATE_LIMIT_MARKERS`` covers the variants real providers emit:
``status code: 429``, ``error code: 429``, the bare ``" 429"`` /
``"429 "`` (with space-guard against false positives like 1429),
``ratelimiterror`` (langchain's exception class name), ``rate
limit`` / ``rate-limited``, and ``too many requests``.
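The marker check described above can be sketched as a case-insensitive substring test. This is an illustrative re-creation, not the code in ``runtime.graph``: the tuple contents are taken from the list above, while the name ``is_rate_limited`` and the choice to scan both the exception class name and its message are assumptions.

```python
# Hypothetical re-creation of the marker check; the real tuple lives in
# runtime.graph._RATE_LIMIT_MARKERS and may differ in detail.
RATE_LIMIT_MARKERS = (
    "status code: 429",
    "error code: 429",
    " 429",  # leading space: "HTTP 429" matches, "took 1429ms" does not
    "429 ",  # trailing space: "429 Too Many Requests" matches
             # (caveat: a token like "1429 " followed by a space still matches)
    "ratelimiterror",
    "rate limit",
    "rate-limited",
    "too many requests",
)


def is_rate_limited(exc: BaseException) -> bool:
    """Return True when the exception's class name or message carries a marker."""
    # Lower-case once so every marker comparison is case-insensitive.
    text = f"{type(exc).__name__}: {exc}".lower()
    return any(marker in text for marker in RATE_LIMIT_MARKERS)
```

Including the class name in the scanned text is what lets a bare ``RateLimitError("slow down")`` match even when the message itself never mentions 429.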
Non-429 4xx (401 unauthorized, 422 schema validation, etc.) keep
their fast-fail behaviour — retrying a quota / auth / schema error
just wastes time and masks the real problem.
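A minimal sketch of the two-regime retry loop, assuming linear backoff (``delay = base * attempt``) and three retries after the first call, which reproduces the delays quoted above. The function name, the trimmed marker tuples, the ``retries`` parameter, and the injectable ``sleep`` are all assumptions for illustration; the real ``_ainvoke_with_retry`` may be structured differently.

```python
import asyncio

# Hypothetical, trimmed marker sets; the real ones are longer.
RATE_LIMIT_MARKERS = ("status code: 429", "error code: 429", "too many requests")
TRANSIENT_MARKERS = ("status code: 500", "status code: 502",
                     "status code: 503", "connection reset")


def pick_base_delay(exc, base_delay, rate_limit_base_delay):
    """Return the backoff base for a retryable error, or None to fast-fail."""
    text = str(exc).lower()
    if any(m in text for m in RATE_LIMIT_MARKERS):
        return rate_limit_base_delay  # slow regime: 7.5s / 15.0s / 22.5s
    if any(m in text for m in TRANSIENT_MARKERS):
        return base_delay             # fast regime: 1.5s / 3.0s / 4.5s
    return None                       # 401, 422, ...: not worth retrying


async def ainvoke_with_retry(call, *, retries=3, base_delay=1.5,
                             rate_limit_base_delay=7.5, sleep=asyncio.sleep):
    """Await call(); retry transient failures with linear backoff."""
    failures = 0
    while True:
        try:
            return await call()
        except Exception as exc:
            failures += 1
            base = pick_base_delay(exc, base_delay, rate_limit_base_delay)
            if base is None or failures > retries:
                raise  # non-transient error, or retry budget exhausted
            await sleep(base * failures)
```

Passing ``sleep`` as a parameter lets a test substitute a recorder and assert on the delay progression without waiting 45 seconds.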
5 new tests in ``tests/test_ainvoke_retry_429.py``:
* ``test_retries_on_5xx_and_returns_eventually`` — pins that the
short-backoff path stays at 1.5s.
* ``test_retries_on_429_with_longer_backoff`` — pins the 7.5s/15s
progression.
* ``test_429_phrasings_all_match`` — exercises every marker.
* ``test_non_transient_error_propagates_without_retry`` — fast-fail
on 401.
* ``test_429_exhausts_max_attempts_then_raises`` — bounded retry,
no infinite loop.
Suite: 1265 passed (was 1260 — added 5), ruff clean.
Two issues caught while live-validating v1.5-C against real
providers:
1. **Stale skill prompt.** The S1 driver's ``responder`` skill was
written in the Phase 15 (response_format JSON) era; its
system_prompt told the LLM "respond in one sentence" with no
markdown contract instructions. Phase 22 (markdown-primary turn
output) made that fail with ``EnvelopeMissingError`` because the
parser has nothing to lift. Add the
``## Response`` / ``## Confidence`` / ``## Signal`` contract
block to the prompt — same pattern as the production skill
prompts under ``examples/incident_management/skills/*/system.md``.
2. **No Azure parametrize arm.** The driver covered ``workhorse``
(OpenRouter) + ``local`` (Ollama). Azure has been first-class in
``runtime.llm.get_llm`` since Phase 13 but had no live verification
path. Add an ``azure`` parametrize arm that constructs an
``AzureChatOpenAI`` from ``AZURE_OPENAI_KEY`` + ``AZURE_ENDPOINT``
+ ``AZURE_DEPLOYMENT`` (defaults to ``gpt-4o``).
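To illustrate why a one-sentence reply fails under the markdown contract from item 1, here is a hedged sketch of a contract block and a parser that lifts the three sections. Only the three heading names come from the commit message. The instruction wording, the ``lift_envelope`` helper, and the local ``EnvelopeMissingError`` stand-in are hypothetical, and the framework's real parser may accept or require more.

```python
import re

# Hypothetical prompt fragment; only the three heading names are from the source.
CONTRACT_BLOCK = """\
Reply in markdown with exactly these sections:

## Response
<your answer>

## Confidence
<your confidence>

## Signal
<your signal>
"""

REQUIRED_SECTIONS = ("Response", "Confidence", "Signal")


class EnvelopeMissingError(ValueError):
    """Local stand-in for the framework error of the same name."""


def lift_envelope(text: str) -> dict[str, str]:
    """Extract the required level-2 sections from a markdown reply."""
    # re.split with one capture group yields
    # [preamble, title1, body1, title2, body2, ...].
    parts = re.split(r"^##[ \t]+(.+?)[ \t]*$", text, flags=re.MULTILINE)
    sections = {title: body.strip()
                for title, body in zip(parts[1::2], parts[2::2])}
    missing = [name for name in REQUIRED_SECTIONS if name not in sections]
    if missing:
        raise EnvelopeMissingError(f"missing sections: {', '.join(missing)}")
    return {name: sections[name] for name in REQUIRED_SECTIONS}
```

A reply such as "Everything looks fine." contains no level-2 headings, so the sketch raises ``EnvelopeMissingError``, which mirrors the failure the stale prompt produced.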
Per-leg skip semantics: each arm independently skips when its keys
are absent. Replaces the global ``pytestmark.skipif`` that required
ALL three keys for any leg to run — partial-key environments now
exercise whichever providers they can reach. Drops the
``_OPENROUTER_KEY and _OLLAMA_KEY and _OLLAMA_BASE_URL`` global
gate; the per-leg gate inside the test body owns it.
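The per-leg gate can be sketched with a small helper that reports which variables an arm is missing. The ``azure`` variable names come from the commit message (``AZURE_DEPLOYMENT`` is left out because it has a default). The OpenRouter and Ollama environment variable names, the ``REQUIRED_ENV`` table, and both helper functions are assumptions for illustration.

```python
import os

# Required variables per parametrize arm. The azure names are from the
# commit message; the workhorse and local names are hypothetical.
REQUIRED_ENV = {
    "workhorse": ("OPENROUTER_API_KEY",),
    "local": ("OLLAMA_API_KEY", "OLLAMA_BASE_URL"),
    "azure": ("AZURE_OPENAI_KEY", "AZURE_ENDPOINT"),
}


def missing_env(arm, env=None):
    """Return the variables this arm needs that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[arm] if not env.get(name)]


def runnable_arms(env=None):
    """Return the arms whose required variables are all present."""
    return [arm for arm in REQUIRED_ENV if not missing_env(arm, env)]
```

Inside a parametrized test body, a non-empty ``missing_env(arm)`` result would be passed to ``pytest.skip(...)`` so that only that leg is skipped while the others still run.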
The ``LLMConfig`` builder also handles a fully-keyless environment
by falling through to a stub provider so config validation passes
during test collection.
Live verification status (with the keys in this dev environment):
* ``local`` — PASSES against Ollama Cloud gpt-oss:20b
* ``workhorse`` — fails on credit / rate-limit (account-specific)
* ``azure`` — fails on connection error (placeholder endpoint in
.env; framework path itself is intact)
…6-1t:free
Demonstrates the v1.5-C per-agent provider story end-to-end with
two REAL providers in flight:
* intake (skill override) → Ollama Cloud gpt-oss:20b
* triage / DI / resolution (default) → OpenRouter
inclusionai/ring-2.6-1t:free
The free OpenRouter tier rate-limits aggressively; the preceding
``feat(retry)`` commit's 429 backoff (7.5s/15s/22.5s) keeps
multi-agent INC runs working through transient throttles.
Operators on a paid OpenRouter plan should swap the model back to
``openai/gpt-4o-mini`` (or any other paid model); the rest of the
registry is unchanged.
Bundles dist/app.py + dist/apps/{code-review,incident-management}.py
in line with the ``runtime.graph._RATE_LIMIT_MARKERS`` +
``_ainvoke_with_retry`` rate-limit branch from the preceding feat
commit. No bundle-only edits.
Summary
v1.5-C follow-up. Three improvements caught while live-validating the per-agent provider story (intake on Ollama, downstream on OpenRouter):
* ``feat(retry)``: Free / shared upstream tiers (e.g. OpenRouter ``…:free`` models) throttle on 30-60s windows. The existing 5xx backoff (1.5s/3s/4.5s) exhausts retries before the window clears, surfacing the 429 as ``EnvelopeMissingError`` or ``agent failed``. Added a separate ``_RATE_LIMIT_MARKERS`` set + longer ``rate_limit_base_delay`` (7.5s/15s/22.5s, total ~45s).
* ``test(integration)``: ``tests/test_integration_driver_s1.py`` was written in the Phase 15 (response_format JSON) era; its ``responder`` skill prompt missed the Phase 22 markdown contract → live Ollama call hard-failed with ``EnvelopeMissingError``. Added the contract block to the prompt. Also added an ``azure`` parametrize arm so the live verification covers all three production provider kinds. Per-leg skip semantics: partial-key environments now exercise whichever providers they can reach.
* ``chore(config)``: Switch ``llm.default`` to ``workhorse`` and point ``workhorse`` at ``inclusionai/ring-2.6-1t:free``. Demonstrates the v1.5-C per-agent flow with two real providers in the same INC. Operators on a paid OpenRouter plan should swap back to a paid model.

Changes
* ``c638352``: ``_ainvoke_with_retry`` two-regime backoff + 5 new tests
* ``c8da236``
* ``7d29cf0``: ``config/config.yaml`` default → free OpenRouter model
* ``1df2072``

Test plan
* ``uv run ruff check src/ tests/``: clean
* ``uv run pytest -x``: 1265 passed, 8 skipped (was 1260, added 5)
* ``tests/test_integration_driver_s1.py::…[local]``: PASSES end-to-end against Ollama Cloud ``gpt-oss:20b``

Live verification matrix (with this dev environment's ``.env``)
* ``local`` (Ollama Cloud)
* ``workhorse`` (OpenRouter free model)
* ``azure``: ``.env`` has placeholder ``AZURE_ENDPOINT='noop…'``; framework path itself constructs ``AzureChatOpenAI`` cleanly

🤖 Generated with Claude Code